Error in UseMethod("left_join") : 
  no applicable method for 'left_join' applied to an object of class "c('SpatialPolygonsDataFrame', 'SpatialPolygons', 'Spatial')"

First need to join up the real property data (Open Baltimore) the sales data (provided by Steve, and with deed dates from January 1, 2010 through October 2018) so we have a neighborhood for as many sales as we can.

1,153 sales didn’t match to a block-lot in the real property table, which means that the block-lot jointly was not in the real prop table.

Also, there are about 16,000 properties in the real prop table that don’t have a neighborhood.

So after joining we end up with 9,428 sales that don’t have a neighborhood.

The real property table also gives if it is principal residence or not, so we’ll also filter for the sales that are for principal residences.

Distribution of city-wide 2018 sales prices:

   85% 
275000 

Include Permits

Neighborhood Summary Table, 2015-2017

We have 16128 samples to work with that are in 2015-2017, have a neighborhood, were an arms-length sale, and are the principal residence.

Which neighborhoods have less than 20 sales meeting the criteria?

84 neighborhoods have less than 20 sales meeting the criteria. We’ll exclude them going forward so we have a reasonable sample size.

98th Percentile

Criteria & Results

There are 170 sales that meet the following criteria:

  • Deed date was between January 1, 2018 and October 5, 2018
  • Arms-length sale
  • Principal residence
  • Neighborhood had at least 20 sales
  • 98th percentile for sales prices for their neighborhood.

(If this yield isn’t high enough we can bump it down to the 95th percentile.)

Map

Full list

Detect jumps

99th Percentile

Criteria & Results

There are 170 sales that meet the following criteria:

  • Deed date was between January 1, 2018 and October 5, 2018
  • Arms-length sale
  • Principal residence
  • Neighborhood had at least 20 sales
  • 99th percentile for sales prices for their neighborhood.

(If this yield isn’t high enough we can bump it down to the 95th percentile.)

In Middle Neighborhood, 99th Percentile for Neighborhood, Over $250k, Over $10k Permit Activity

The following criteria are used below:

[1] 46

Further filter for permit value totals over $10,000.

Results in 27 properties.

[1] 28

MRIS data Nov 2018 - February 2019

---
title: "Recent Sales Outliers"
author: "Justin Elszasz, Mayor's Office of Innovation"
email: "justin.elszasz@baltimorecity.gov"
date: "Thursday, February 28, 2019"
output:
  html_notebook:
    code_folding: hide
    fig_height: 5
    fig_width: 10
    toc: yes
    toc_depth: 2
---

```{r setup, include = FALSE, echo = FALSE, message = FALSE, cache = TRUE}
knitr::opts_chunk$set(echo = FALSE, warning = F, message = F, include = T,
                      fig.width = 10, fig.height = 5, 
                      out.width = "100%", out.height = "100%")
```


```{r}
source("../src/00_initialize.R")
sales <- load_sales_data(load.cache = T)

library(readxl)
library(RSocrata)
library(sp)
library(leaflet)
library(RODBC)
library(htmltools)

real.prop.url <- "https://data.baltimorecity.gov/resource/6act-qzuy.json"
real.prop <- read.socrata(real.prop.url, app_token = VARS$SOCRATA_TOKEN)

hoods <- get_neighborhood_boundaries()
hmt <- load_block_group_data(load.cache = T)
hmt.hood <- read_excel("../data/raw/hmt/HMT by Neighborhood 2017.xlsx")

conn.gis <- odbcDriverConnect(
  paste0(
    'driver={ODBC Driver 13 for SQL Server};',
    'server=', VARS$EGIS_SERVER, 
    ';uid=', VARS$EGIS_SERVER_USER,
    ';pwd=',VARS$EGIS_SERVER_PWD, 
    ';database=housing;trusted_connection=No')
)

permits <- sqlFetch(conn.gis, "housing.tbl_Permit")

mris.recent <- read_excel("../data/raw/sales/Nov 2018 - Feb 2019 sales for Justin.xlsx")
```

```{r}
# join HMT hood labels to hood boundaries

hoods@data <- hoods@data %>% left_join(hmt.hood, by = c("label" = "Neighborhood"))
```


First need to join up the real property data ([Open Baltimore]([http://data.baltimorecity.gov/Financial/Real-Property-Taxes/27w9-urtvto)) the sales data (provided by Steve, and with **deed dates from January 1, 2010 through October 2018**) so we have a neighborhood for as many sales as we can.

```{r}
sales <- sales %>% rename(sales.block = Block, sales.lot = Lot)
real.prop <- real.prop %>% rename(real.block = block, real.lot = lot)
```

```{r}
real.prop <- real.prop %>%
  mutate(real.block.clean = gsub("^0+", "", real.block),
         real.lot.clean = gsub("^0+", "", real.lot))

sales <- sales %>%
  mutate(sales.block.clean = gsub("^0+", "", sales.block),
         sales.lot.clean = gsub("^0+", "", sales.lot))
```

```{r}
sales <- sales %>%
  left_join(real.prop, 
            by = c("sales.block.clean" = "real.block.clean",
                   "sales.lot.clean" = "real.lot.clean")
            )
```

```{r}
sales %>% count(is.na(real.block), is.na(real.lot))
```

1,153 sales didn't match to a block-lot in the real property table, which means that the block-lot jointly was not in the real prop table. 

Also, there are about 16,000 properties in the real prop table that don't have a neighborhood. 

```{r}
real.prop %>% count(is.na(neighborhood))
```

So after joining we end up with 9,428 sales that don't have a neighborhood.

```{r}
sales %>% count(!is.na(neighborhood))
```

The real property table also gives if it is principal residence or not, so we'll also filter for the sales that are for principal residences.

```{r}
sales %>% count(rescode)
```

Distribution of city-wide 2018 sales prices:

```{r}
sales %>%
  filter(year(deed.date) == 2018) %>%
  ggplot(aes(`Sales Price`)) +
  geom_histogram() +
  theme_iteam_google_docs() +
  xlim(c(0, 500000))
```

```{r}
quantile(sales$`Sales Price`, 0.85)
```
 
## Include Permits 

```{r}
permits %>% count(csm_status)
```

```{r}
permits %>% count(csm_type_work)
```

# Neighborhood Summary Table, 2015-2017

```{r}
meet.criteria <- sales %>%
  filter(year(deed.date) %in% c(2015, 2016, 2017),
         !is.na(neighborhood),
         `How Conveyed` == 1,
         !grepl("NOT", rescode)) %>%
  nrow
```

We have `r meet.criteria` samples to work with that are in 2015-2017, have a neighborhood, were an arms-length sale, and are the principal residence.

```{r}
sales.summary.15_17.by.hood <- sales %>%
  filter(year(deed.date) %in% c(2015, 2016, 2017),
         !is.na(neighborhood),
         `How Conveyed` == 1,
         !grepl("NOT", rescode)) %>%
  group_by(neighborhood) %>%
  summarise(hood.n = n(),
            hood.mean = mean(`Sales Price`),
            hood.median = median(`Sales Price`),
            hood.std = sqrt(sum((`Sales Price`-hood.mean)^2/(hood.n-1))),
            hood.95th = quantile(`Sales Price`, probs = .95),
            hood.98th = quantile(`Sales Price`, probs = .98),
            hood.99th = quantile(`Sales Price`, probs = .99))

sales.summary.15_17.by.hood  
```

Which neighborhoods have less than 20 sales meeting the criteria?

```{r}
sales.summary.15_17.by.hood %>%
  filter(hood.n < 20)
```

84 neighborhoods have less than 20 sales meeting the criteria. We'll exclude them going forward so we have a reasonable sample size.

```{r}
# Join the summaries to the neighborhood boundaries
hoods@data <- hoods@data %>% 
  left_join(sales.summary.15_17.by.hood,
            by = c("label" = "neighborhood"))
```


# 98th Percentile

## Criteria & Results

```{r}
sales.hood.98th <- sales %>%
  left_join(sales.summary.15_17.by.hood,
            by = c("neighborhood" = "neighborhood")) %>%
  filter(year(deed.date) == 2018,
         hood.n >= 20,
         `Sales Price` >= hood.98th,
         `How Conveyed` == 1,
         !grepl("NOT", rescode)) %>%
  arrange(neighborhood)

result.sales <- nrow(sales.hood.98th)
```

**There are `r result.sales` sales that meet the following criteria:**

- Deed date was between January 1, 2018 and October 5, 2018
- Arms-length sale
- Principal residence
- Neighborhood had at least 20 sales
- 98th percentile for sales prices for their neighborhood.

(If this yield isn't high enough we can bump it down to the 95th percentile.)


```{r}
sales.hood.98th$long <- lapply(sales.hood.98th$location.coordinates, function(x) x[1]) %>% unlist()

sales.hood.98th$lat <- lapply(sales.hood.98th$location.coordinates, function(x) x[2]) %>% unlist()
```


```{r}
sales.hood.98th.geo <- sales.hood.98th %>% filter(!is.na(long))
  
sales.hood.98th.geo <- SpatialPointsDataFrame(
  sales.hood.98th.geo %>% select(long, lat), 
  sales.hood.98th.geo,
  proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))

sales.hood.98th.geo <- 
  spTransform(
    sales.hood.98th.geo, 
    CRSobj = CRS("+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
    )

```

## Map

```{r}
library(htmltools)

hoods.labels <- paste0(
  hoods$label,
  "<br>2017 Housing Market Typology: ", hoods$`Predominant Code Ignoring Non-Residential`,
  "<br>Median Sales, 2015-2017: ", as.character(hoods$hood.median)
  
)

sale.labels <- paste0(
  sales.hood.98th.geo$`House #`, " ",
  sales.hood.98th.geo$`Street Name`, " ",
  sales.hood.98th.geo$Suffix, 
  "<br>Sale Price in 2018: ", 
  as.character(sales.hood.98th.geo$`Sales Price`),
  "<br>New Owner: ", sales.hood.98th.geo$new.owner
)


leaflet() %>%
  setView(lng = -76.6, lat = 39.3, zoom = 11) %>%
  addProviderTiles(providers$Stamen.TonerLite) %>% 
  addPolygons(data = hoods, 
              weight = 2, 
              color = "black",
              opacity = 0.5,
              fillOpacity = 0, 
              label = ~lapply(hoods.labels, HTML)) %>%
  addCircleMarkers(data = sales.hood.98th.geo, 
                   radius = 2,
                   label = ~lapply(sale.labels, HTML))
```

## Full list

```{r}
sales.hood.98th 
```

# Detect jumps

```{r}
#look for temporal jumps in prices
```

# 99th Percentile

## Criteria & Results

```{r}
sales.hood.99th <- sales %>%
  left_join(sales.summary.15_17.by.hood,
            by = c("neighborhood" = "neighborhood")) %>%
  filter(year(deed.date) == 2018,
         hood.n >= 20,
         `Sales Price` >= hood.99th,
         `How Conveyed` == 1,
         !grepl("NOT", rescode)) %>%
  arrange(neighborhood)

result.sales <- nrow(sales.hood.99th)
```

**There are `r result.sales` sales that meet the following criteria:**

- Deed date was between January 1, 2018 and October 5, 2018
- Arms-length sale
- Principal residence
- Neighborhood had at least 20 sales
- 99th percentile for sales prices for their neighborhood.

(If this yield isn't high enough we can bump it down to the 95th percentile.)


```{r}
sales.hood.99th$long <- lapply(sales.hood.99th$location.coordinates, function(x) x[1]) %>% unlist()

sales.hood.99th$lat <- lapply(sales.hood.99th$location.coordinates, function(x) x[2]) %>% unlist()
```


```{r}
sales.hood.99th.geo <- sales.hood.99th %>% filter(!is.na(long))
  
sales.hood.99th.geo <- SpatialPointsDataFrame(
  sales.hood.99th.geo %>% select(long, lat), 
  sales.hood.99th.geo,
  proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))

sales.hood.99th.geo <- 
  spTransform(
    sales.hood.99th.geo, 
    CRSobj = CRS("+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
    )

```

```{r}

sale.labels <- paste0(
  sales.hood.99th.geo$`House #`, " ",
  sales.hood.99th.geo$`Street Name`, " ",
  sales.hood.99th.geo$Suffix, 
  "<br>Sale Price in 2018: ", 
  as.character(sales.hood.99th.geo$`Sales Price`),
  "<br>New Owner: ", sales.hood.99th.geo$new.owner
)


leaflet() %>%
  setView(lng = -76.6, lat = 39.3, zoom = 11) %>%
  addProviderTiles(providers$Stamen.TonerLite) %>% 
  addPolygons(data = hoods, 
              weight = 2, 
              color = "black",
              opacity = 0.5,
              fillOpacity = 0, 
              label = ~lapply(hoods.labels, HTML)) %>%
  addCircleMarkers(data = sales.hood.99th.geo, 
                   radius = 2,
                   label = ~lapply(sale.labels, HTML))
```

```{r}
emily.list <- c("4011 BARRINGTON",
                "DORCHESTER",
                "2319 MONTICELLO")

sales.hood.98th.geo@data %>% 
  filter(grepl(paste(emily.list, collapse="|"), propertyaddress))
```

# In Middle Neighborhood, 99th Percentile for Neighborhood, Over $250k, Over $10k Permit Activity

The following criteria are used below:

- Deed date was between January 1, 2018 and October 5, 2018
- Arms-length sale
- Principal residence
- Neighborhood had at least 20 sales
- 99th percentile for sales prices for their neighborhood.
- Sale price over $250,000
- Permits issued between 2017-2018
- Total permit value at least $10,000

```{r}
permits.recent <- permits %>% 
  filter(csm_issued_date >= "2017-01-01") %>%
  mutate(permit.block.clean = gsub("^0+", "", prc_block_no),
         permit.lot.clean = gsub("^0+", "", prc_lot))

permits.recent.summary <- permits.recent %>%
  group_by(permit.block.clean, permit.lot.clean) %>%
  summarise(permit.count = n(),
            permit.total.value = sum(csm_cost, na.rm = T))
```

```{r}
mid.hoods <- hmt.hood %>% filter(`Predominant Code Ignoring Non-Residential` %in% c("D", "E", "F", "G", "H"))

sales.99th.mid.hood <- subset(sales.hood.99th.geo, tolower(neighborhood) %in% tolower(mid.hoods$Neighborhood))

sales.99th.mid.hood.over.250k <- subset(sales.99th.mid.hood,
                                        `Sales Price` > 250000)
```

```{r}
sales.99th.mid.hood.over.250k@data %>% nrow
```


```{r}
sales.99th.mid.hood.over.250k@data <- sales.99th.mid.hood.over.250k@data %>%
  left_join(permits.recent.summary, 
            by = c("sales.block.clean" = "permit.block.clean",
                   "sales.lot.clean" = "permit.lot.clean"))
```

Further filter for permit value totals over $10,000.

```{r}
sales.99th.mid.hood.over.250k.10k.permit <- subset(sales.99th.mid.hood.over.250k, permit.total.value >= 10000)
                                                   
```

Results in 27 properties.

```{r}
sales.99th.mid.hood.over.250k.10k.permit@data %>% nrow
```


```{r out.width="100%", width = 8}

mid.hoods.geo <- subset(hoods, 
                        tolower(label) %in% tolower(mid.hoods$Neighborhood))



mid.hoods.labels <- paste0(
  mid.hoods.geo$label,
  "<br>2017 Housing Market Typology: ", mid.hoods.geo$`Predominant Code Ignoring Non-Residential`,
  "<br>Median Sales, 2015-2017: ", as.character(mid.hoods.geo$hood.median)
  
)

sale.labels <- paste0(
  "Analysis: SDAT (through October 18)<br>",
  sales.99th.mid.hood.over.250k.10k.permit$`House #`, " ",
  sales.99th.mid.hood.over.250k.10k.permit$`Street Name`, " ",
  sales.99th.mid.hood.over.250k.10k.permit$Suffix, 
  "<br>Sale Price in 2018: ", 
  as.character(sales.99th.mid.hood.over.250k.10k.permit$`Sales Price`),
  "<br>New Owner: ", sales.99th.mid.hood.over.250k.10k.permit$new.owner,
  "<br>Permits Issued from 2017-2018: ", sales.99th.mid.hood.over.250k.10k.permit$permit.count,
  "<br>Total Permit Value from 2017-2018: ", sales.99th.mid.hood.over.250k.10k.permit$permit.total.value
)


leaflet() %>%
  setView(lng = -76.6, lat = 39.3, zoom = 11) %>%
  addProviderTiles(providers$Stamen.TonerLite) %>% 
  addPolygons(data = hoods, 
              weight = 2, 
              
              color = "black",
              opacity = 0.5,
              fillOpacity = 0, 
              label = ~lapply(hoods.labels, HTML)) %>%
  addPolygons(data = mid.hoods.geo, 
              weight = 2, 
              #color = "black",
              opacity = 0.0,
              fillOpacity = .2,
              fillColor = iteam.colors[3],
              label = ~lapply(mid.hoods.labels, HTML)) %>%
  addCircleMarkers(data = sales.99th.mid.hood.over.250k.10k.permit, 
                   color = iteam.colors[1],
                   opacity = 1,
                   radius = 2,
                   label = ~lapply(sale.labels, HTML))
```


```{r}
sales.99th.mid.hood.over.250k.10k.permit@data
```

# MRIS data Nov 2018 - February 2019

```{r}
library(ggmap)
register_google(Sys.getenv("GOOGLE_GEO_KEY"))
```

### List that Julie & Justin found during dashboard search

```{r}
julie.list <- read_excel("../data/raw/sales/Nov 2018 - Feb 2019 sales for Justin.xlsx", sheet = 2)
```

```{r}
julie.list.geocoded <- julie.list %>%
  mutate(full.address = paste0(Address, ", Baltimore, MD, ", Zip_Code)) %>%
  mutate_geocode(full.address)

julie.list.geocoded

```

```{r}
filename <- "../data/processed/MRIS_outliers_geocoded.rds"

if (file.exists(filename) == F){
  
  mris.recent.outliers <- mris.recent %>%
    left_join(sales.summary.15_17.by.hood,
              by = c("Neighborhood" = "neighborhood")) %>%
    filter(hood.n >= 20,
           Close_Price >= hood.99th) %>%
    arrange(Neighborhood) %>%
    mutate(full.address = paste0(Address, ", Baltimore, MD, ", Zip_Code)) %>%
    mutate_geocode(full.address)
  
  saveRDS(mris.recent.outliers, filename)
  
}

mris.recent.outliers <- readRDS(filename)

mris.recent.outliers.reno <- mris.recent.outliers %>% filter(!is.na(Year_Renovated))
```

```{r}
mris.sale.labels <- paste0(
  "Analysis: MRIS (Nov 18 - Feb 19)<br>",
  mris.recent.outliers.reno$Address,
  "<br>Close Date: ", mris.recent.outliers.reno$Close_Date,
  "<br>Sale Price in 2018: ", 
  as.character(mris.recent.outliers.reno$Close_Price),
  "<br>Recent Renovation: ", mris.recent.outliers.reno$Year_Renovated
)

julie.list.labels <- paste0(
  "Julie/Justin List<br>",
  julie.list.geocoded$Address,
  "<br>Close Date: ", julie.list.geocoded$Close_Date,
  "<br>Sale Price in 2018: ", 
  as.character(julie.list.geocoded$Close_Price),
  "<br>Recent Renovation: ", julie.list.geocoded$Year_Renovated
)


leaflet(width = "100%") %>%
  setView(lng = -76.6, lat = 39.3, zoom = 11) %>%
  addProviderTiles(providers$Stamen.TonerLite) %>% 
  addPolygons(data = hoods, 
              weight = 2, 
              color = "black",
              opacity = 0.5,
              fillOpacity = 0, 
              label = ~lapply(hoods.labels, HTML)) %>%
  addPolygons(data = mid.hoods.geo, 
              weight = 2, 
              #color = "black",
              opacity = 0.0,
              fillOpacity = .2,
              fillColor = iteam.colors[3],
              label = ~lapply(mid.hoods.labels, HTML)) %>%
      addCircleMarkers(data = julie.list.geocoded,
color = iteam.colors[3],
                   opacity = 1,
                   radius = 2,
                   label = ~lapply(julie.list.labels, HTML)) %>%
  addCircleMarkers(data = mris.recent.outliers.reno,

                   color = "red",
                   opacity = 1,
                   radius = 2,
                   label = ~lapply(mris.sale.labels, HTML)) %>%
  addCircleMarkers(data = sales.99th.mid.hood.over.250k.10k.permit, 
                   color = iteam.colors[1],
                   opacity = 1,
                   radius = 2,
                   label = ~lapply(sale.labels, HTML)) %>%
  addLegend(position = "bottomright",
            colors = c(iteam.colors[3], "red", iteam.colors[1]),
           labels = c("Justin/Julie LB Dashboard Search", 
                      "MRIS Recent (Nov 18 - Feb 19) + Recent Reno", 
                      "SDAT (2018 thru Oct) + 10k Permit"),
           opacity = 1)


```



























